Language Syntax in a Text Recognition Algorithm
Abstract
A Markov model for language syntax and its use in a text recognition algorithm are proposed. Syntactic constraints are described by the transition probabilities between syntactic classes. The confusion between the feature string for a word and the syntactic classes is likewise described probabilistically. A modification of the Viterbi algorithm is proposed that finds a fixed number of sequences of syntactic classes for a given sentence that have the highest probabilities of occurrence, given the feature strings for the words. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. It is shown that the Viterbi algorithm can significantly reduce the number of words that can possibly match an image.
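The idea of keeping a fixed number of best class sequences can be illustrated with a small sketch. The code below is not the authors' implementation. It is one common way to get the k best paths through a Markov model, by keeping up to k partial paths per class at each word position. The class names, the feature-string labels `x1` to `x3`, and all probabilities are hypothetical values chosen only for illustration.

```python
import heapq
import math


def top_k_viterbi(observations, classes, start_p, trans_p, emit_p, k=3):
    """Return up to k (log_prob, class_path) pairs, best first.

    observations: feature strings, one per word (hypothetical labels here).
    trans_p[a][b]: probability that class b follows class a.
    emit_p[c][x]: probability of feature string x given class c.
    """
    # beams[c] holds up to k (log_prob, path) pairs for paths ending in c.
    beams = {}
    for c in classes:
        p = start_p.get(c, 0.0) * emit_p[c].get(observations[0], 0.0)
        beams[c] = [(math.log(p), (c,))] if p > 0 else []

    for obs in observations[1:]:
        new_beams = {}
        for c in classes:
            e = emit_p[c].get(obs, 0.0)
            candidates = []
            if e > 0:
                for prev in classes:
                    t = trans_p[prev].get(c, 0.0)
                    if t <= 0:
                        continue
                    step = math.log(t) + math.log(e)
                    # Extend every surviving path that ends in `prev`.
                    for lp, path in beams[prev]:
                        candidates.append((lp + step, path + (c,)))
            # Keep only the k best partial paths ending in class c.
            new_beams[c] = heapq.nlargest(k, candidates)
        beams = new_beams

    final = [item for c in classes for item in beams[c]]
    return heapq.nlargest(k, final)


# Hypothetical toy model: three syntactic classes, three feature strings.
classes = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.6, "NOUN": 0.3, "DET": 0.1},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"x1": 0.7, "x2": 0.1},
    "NOUN": {"x1": 0.1, "x2": 0.6, "x3": 0.3},
    "VERB": {"x2": 0.3, "x3": 0.7},
}

results = top_k_viterbi(["x1", "x2", "x3"], classes, start_p, trans_p, emit_p, k=3)
for log_prob, path in results:
    print(path, round(math.exp(log_prob), 6))
```

In a recognizer of the kind the abstract describes, the retained class sequences could then be used to discard word candidates whose syntactic class appears in none of them at the corresponding position. That filtering step is not shown here.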
Similar resources
Incorporation of a Markov Model of Language Syntax in a Text Recognition Algorithm
A Markov model for language syntax and its use in a text recognition algorithm is proposed. Syntactic constraints are described by the transition probabilities between classes. The confusion between the feature string for a word and the syntactic classes is also described probabilistically. A modification of the Viterbi algorithm is also proposed that finds a fixed number of sequences of synt...
A Hidden Markov Model for Language Syntax in Text Recognition
The use of a hidden Markov model (HMM) for language syntax to improve the performance of a text recognition algorithm is proposed. Syntactic constraints are described by the transition probabilities between word classes. The confusion between the feature string for a word and the various syntactic classes is also described probabilistically. A modification of the Viterbi algorithm is also propo...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use the space character instead. This common incorrect use of the space leads to some s...
Detection and Recognition of Multi-language Traffic Sign Context by Intelligent Driver Assistance Systems
This paper concerns the design of a new intelligent driver assistance system based on traffic sign detection with Persian context. The primary aim of this system is to increase the precision of drivers in choosing their path with regard to traffic signs. To achieve this goal, a new framework that implements fuzzy logic was used to detect traffic signs in videos captured along a highway f...
Improving Persian Named Entity Recognition Using Kasre Ezafe
Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...